1. Data

Information about the test:

question description
A1 linear function
A2 3D
A3 arithmetic rules
A4 Easy ineq.
A5 logs
A6 logs
A7 graph translation
A8 sine pi/3
A9 trig.ineq.
A10 trig.identity
A11 period
A12 rational exponents
A13 ellipsoid
A14 limit
A15 series
A16 diff.quotient
A17 graph f'
A18 product rule
A19 quotient rule
A20 (exp)'
A21 (ln(sin))'
A22 hyp.functions
A23 slope tangent
A24 IVT
A25 velocity
A26 int(poly)
A27 int(exp)
A28 Int = 0
A29 int even funct
A31 int(abs)
A32 FtoC algebra
A33 difference vectors
A35 intersect planes
A36 parallel planes
A30 FtoC graph
A34 normal vector

Load the student scores for the test - here we load the 2017 and 2018 ETH Zurich test data:

test_scores
## # A tibble: 3,433 x 38
##    year  class    A1    A2    A3    A4    A5    A6    A7    A8    A9   A10   A11
##    <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2017  s21t~     1     0     1     1     1     0     1     0     1     0     1
##  2 2017  s21t~     1     0     1     1     1     1     0     1     1     2     1
##  3 2017  s21t~     1     0     0     0     1     1     1     0     1     1     1
##  4 2017  s21t~     1     0     1     1     1     1     1     1     1     1     1
##  5 2017  s21t~     1     0     1     0     2     0     1     0     1     0     2
##  6 2017  s21t~     0     1     0     0     1     2     0     2     2     2     1
##  7 2017  s21t~     1     0     1     0     2     1     0     1     2     1     0
##  8 2017  s21t~     1     1     1     1     2     1     1     2     2     2     2
##  9 2017  s21t~     1     1     0     1     1     1     1     1     1     1     1
## 10 2017  s21t~     1     0     1     0     0     1     0     0     1     1     1
## # ... with 3,423 more rows, and 25 more variables: A12 <dbl>, A13 <dbl>,
## #   A14 <dbl>, A15 <dbl>, A16 <dbl>, A17 <dbl>, A18 <dbl>, A19 <dbl>,
## #   A20 <dbl>, A21 <dbl>, A22 <dbl>, A23 <dbl>, A24 <dbl>, A25 <dbl>,
## #   A26 <dbl>, A27 <dbl>, A28 <dbl>, A29 <dbl>, A30 <dbl>, A31 <dbl>,
## #   A32 <dbl>, A33 <dbl>, A34 <dbl>, A35 <dbl>, A36 <dbl>

Missing data

The data includes scores of 2 for the “I don’t know” answer option. We replace these with NA:

test_scores <- test_scores %>% 
  mutate(across(starts_with("A"), ~ na_if(., 2))) %>%
  # A few students gave "I don't know" as their answer to every question
  # Here we remove them, since rows with all NAs cause problems for mirt
  filter(!if_all(starts_with("A"), is.na))

Data summary

The number of responses from each class:

test_scores %>% 
  group_by(year, class) %>% 
  tally() %>% 
  gt() %>% 
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
## Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
class n
2017
s21t-000-2017 1678
2018
s21t-000-2018 1746

Mean and standard deviation for each item:

test_scores %>% 
  select(-class) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  group_by(year) %>% 
  gt() %>% 
  fmt_number(columns = contains("numeric"), decimals = 3) %>%
  data_color(
    columns = c("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  cols_label(
    numeric.mean = "Mean",
    numeric.sd = "SD"
  )
skim_variable n_missing complete_rate Mean SD
2017
A1 15 0.9910608 0.960 0.195
A2 131 0.9219309 0.333 0.471
A3 77 0.9541120 0.655 0.476
A4 27 0.9839094 0.655 0.476
A5 544 0.6758045 0.692 0.462
A6 362 0.7842670 0.901 0.298
A7 19 0.9886770 0.716 0.451
A8 311 0.8146603 0.583 0.493
A9 613 0.6346841 0.731 0.444
A10 452 0.7306317 0.745 0.436
A11 352 0.7902265 0.729 0.445
A12 216 0.8712753 0.784 0.412
A13 211 0.8742551 0.429 0.495
A14 237 0.8587604 0.653 0.476
A15 269 0.8396901 0.551 0.498
A16 427 0.7455304 0.249 0.433
A17 154 0.9082241 0.649 0.477
A18 468 0.7210965 0.529 0.499
A19 566 0.6626937 0.533 0.499
A20 108 0.9356377 0.776 0.417
A21 408 0.7568534 0.674 0.469
A22 477 0.7157330 0.723 0.448
A23 492 0.7067938 0.584 0.493
A24 326 0.8057211 0.730 0.444
A25 78 0.9535161 0.568 0.495
A26 250 0.8510131 0.938 0.241
A27 590 0.6483909 0.603 0.490
A28 647 0.6144219 0.686 0.464
A29 373 0.7777116 0.624 0.485
A30 145 0.9135876 0.830 0.376
A31 214 0.8724672 0.432 0.495
A32 466 0.7222884 0.315 0.465
A33 57 0.9660310 0.808 0.394
A34 497 0.7038141 0.826 0.379
A35 886 0.4719905 0.688 0.464
A36 830 0.5053635 0.436 0.496
2018
A1 16 0.9908362 0.964 0.186
A2 97 0.9444444 0.300 0.458
A3 37 0.9788087 0.658 0.475
A4 23 0.9868270 0.634 0.482
A5 441 0.7474227 0.659 0.474
A6 269 0.8459336 0.870 0.336
A7 13 0.9925544 0.711 0.453
A8 199 0.8860252 0.561 0.496
A9 549 0.6855670 0.673 0.469
A10 367 0.7898053 0.687 0.464
A11 293 0.8321879 0.700 0.458
A12 127 0.9272623 0.770 0.421
A13 188 0.8923253 0.401 0.490
A14 192 0.8900344 0.645 0.479
A15 220 0.8739977 0.530 0.499
A16 341 0.8046964 0.250 0.433
A17 120 0.9312715 0.648 0.478
A18 372 0.7869416 0.463 0.499
A19 468 0.7319588 0.468 0.499
A20 85 0.9513173 0.762 0.426
A21 336 0.8075601 0.646 0.478
A22 420 0.7594502 0.682 0.466
A23 397 0.7726231 0.558 0.497
A24 260 0.8510882 0.678 0.467
A25 46 0.9736541 0.753 0.431
A26 230 0.8682703 0.929 0.256
A27 531 0.6958763 0.576 0.494
A28 529 0.6970218 0.655 0.476
A29 319 0.8172967 0.633 0.482
A30 121 0.9306987 0.786 0.410
A31 166 0.9049255 0.411 0.492
A32 436 0.7502864 0.272 0.445
A33 63 0.9639175 0.791 0.407
A34 425 0.7565865 0.815 0.389
A35 797 0.5435281 0.612 0.488
A36 741 0.5756014 0.397 0.490

2. Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

Local independence

TODO Perhaps just remove this section? See the later section on local independence which is consistent with DeMars p.48

This plot shows the correlations between scores on each pair of items:

item_scores <- test_scores %>% 
  select(-class, -year)

cor_ci <- psych::corCi(item_scores, plot = FALSE)

psych::cor.plot.upperLowerCi(cor_ci)

There are a few correlations that are not significantly different from 0:

cor_ci$ci %>% 
  as_tibble(rownames = "corr") %>% 
  filter(p > 0.05) %>% 
  arrange(-p) %>% 
  select(-contains(".e")) %>% 
  gt() %>% 
  fmt_number(columns = 2:4, decimals = 3)
corr lower upper p
A1-A29 −0.024 0.047 0.519
A24-A29 −0.023 0.055 0.417
A2-A34 −0.021 0.062 0.339
A29-A31 −0.062 0.021 0.331
A1-A2 −0.014 0.051 0.260
A1-A5 −0.014 0.069 0.194
A17-A29 −0.010 0.071 0.146
A2-A25 −0.009 0.067 0.139
A13-A29 −0.007 0.078 0.099
A2-A29 −0.007 0.079 0.098
A1-A28 −0.007 0.086 0.097
A1-A12 −0.003 0.069 0.073
A16-A34 −0.003 0.084 0.070
A29-A30 −0.001 0.079 0.054

The overall picture is that the item scores are well correlated with each other.

Dimensionality

structure <- check_factorstructure(item_scores)
n <- n_factors(item_scores)

Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.93).
  • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(630) = 20512.12, p < .001).

Method Agreement Procedure:

The choice of 1 dimensions is supported by 4 (17.39%) methods out of 23 (Acceleration factor, VSS complexity 1, Velicer’s MAP, RMSEA).

plot(n)

summary(n) %>% gt()
n_Factors n_Methods
1 4
2 1
3 1
4 4
5 1
7 2
12 2
18 1
26 1
28 1
33 1
34 1
35 3
#n %>% tibble() %>% gt()
fa.parallel(item_scores, fa = "fa")

## Parallel analysis suggests that the number of factors =  11  and the number of components =  NA

1 Factor

We use the factanal function to fit a 1-factor model. Note that this function cannot handle missing data, so we set the NA scores to 0 for this analysis.

fitfact <- factanal(item_scores %>%
                      mutate(across(everything(), ~ replace_na(.x, 0))),
                    factors = 1,
                    rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores %>% mutate(across(everything(), ~replace_na(.x,     0))), factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.96 0.95 0.76 0.80 0.72 0.75 0.79 0.79 0.61 0.66 0.68 0.88 0.91 0.73 0.87 0.81 
##  A17  A18  A19  A20  A21  A22  A23  A24  A25  A26  A27  A28  A29  A30  A31  A32 
## 0.78 0.78 0.78 0.64 0.59 0.61 0.64 0.73 0.95 0.69 0.63 0.65 0.85 0.81 0.79 0.78 
##  A33  A34  A35  A36 
## 0.89 0.86 0.79 0.84 
## 
## Loadings:
##  [1] 0.53 0.50 0.62 0.59 0.57 0.52 0.60 0.64 0.62 0.60 0.52 0.55 0.61 0.60     
## [16]      0.49 0.45 0.45 0.46 0.35 0.31 0.36 0.44 0.47 0.46 0.47      0.38 0.43
## [31] 0.46 0.47 0.33 0.37 0.46 0.40
## 
##                Factor1
## SS loadings       8.27
## Proportion Var    0.23
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 5393.45 on 594 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)

ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()
## Warning: ggrepel: 15 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(item_info, by = "question") %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
question factor_loading description
A1 0.2094343 linear function
A2 0.2320310 3D
A3 0.4938949 arithmetic rules
A4 0.4507755 Easy ineq.
A5 0.5309866 logs
A6 0.5041101 logs
A7 0.4543472 graph translation
A8 0.4582495 sine pi/3
A9 0.6222243 trig.ineq.
A10 0.5862856 trig.identity
A11 0.5687470 period
A12 0.3457532 rational exponents
A13 0.3051984 ellipsoid
A14 0.5208039 limit
A15 0.3575489 series
A16 0.4372204 diff.quotient
A17 0.4670996 graph f'
A18 0.4636871 product rule
A19 0.4712863 quotient rule
A20 0.6026572 (exp)'
A21 0.6398888 (ln(sin))'
A22 0.6213857 hyp.functions
A23 0.5995268 slope tangent
A24 0.5187729 IVT
A25 0.2305446 velocity
A26 0.5537627 int(poly)
A27 0.6075910 int(exp)
A28 0.5956103 Int = 0
A29 0.3846979 int even funct
A30 0.4319645 FtoC graph
A31 0.4631794 int(abs)
A32 0.4687184 FtoC algebra
A33 0.3343483 difference vectors
A34 0.3725708 normal vector
A35 0.4572324 intersect planes
A36 0.4011620 parallel planes
# To proceed with the IRT analysis, comment out the following line before knitting
#knitr::knit_exit()

3. Fitting 2 parameter logistic MIRT model

We can fit a Multidimensional Item Response Theory (mirt) model. From the function definition:

mirt fits a maximum likelihood (or maximum a posteriori) factor analysis model to any mixture of dichotomous and polytomous data under the item response theory paradigm using either Cai's (2010) Metropolis-Hastings Robbins-Monro (MHRM) algorithm.

The process is to first fit the model, and save the result as a model object that we can then parse to get tabular or visual displays of the model that we might want. When fitting the model, we have the option to specify a few arguments, which then get interpreted as parameters to be passed to the model.

fit_2pl <- mirt(
  data = item_scores, # just the columns with question scores
  model = 1,          # number of factors to extract
  itemtype = "2PL",   # 2 parameter logistic model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -55174.300, Max-Change: 1.11769
Iteration: 2, Log-Lik: -53440.531, Max-Change: 0.24450
Iteration: 3, Log-Lik: -53280.158, Max-Change: 0.12660
Iteration: 4, Log-Lik: -53207.382, Max-Change: 0.08141
Iteration: 5, Log-Lik: -53164.273, Max-Change: 0.05506
Iteration: 6, Log-Lik: -53136.182, Max-Change: 0.04470
Iteration: 7, Log-Lik: -53116.411, Max-Change: 0.03939
Iteration: 8, Log-Lik: -53102.191, Max-Change: 0.03652
Iteration: 9, Log-Lik: -53091.778, Max-Change: 0.03240
Iteration: 10, Log-Lik: -53084.093, Max-Change: 0.02883
Iteration: 11, Log-Lik: -53078.411, Max-Change: 0.02504
Iteration: 12, Log-Lik: -53074.180, Max-Change: 0.02185
Iteration: 13, Log-Lik: -53061.970, Max-Change: 0.00565
Iteration: 14, Log-Lik: -53061.822, Max-Change: 0.00437
Iteration: 15, Log-Lik: -53061.712, Max-Change: 0.00387
Iteration: 16, Log-Lik: -53061.388, Max-Change: 0.00164
Iteration: 17, Log-Lik: -53061.378, Max-Change: 0.00124
Iteration: 18, Log-Lik: -53061.374, Max-Change: 0.00105
Iteration: 19, Log-Lik: -53061.364, Max-Change: 0.00029
Iteration: 20, Log-Lik: -53061.364, Max-Change: 0.00018
Iteration: 21, Log-Lik: -53061.364, Max-Change: 0.00018
Iteration: 22, Log-Lik: -53061.362, Max-Change: 0.00016
Iteration: 23, Log-Lik: -53061.362, Max-Change: 0.00014
Iteration: 24, Log-Lik: -53061.362, Max-Change: 0.00012
Iteration: 25, Log-Lik: -53061.362, Max-Change: 0.00007
## 
## Calculating information matrix...

Local independence

We compute Yen’s \(Q_3\) (1984) to check for any dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with scores above 0.2 regarded as problematic (see DeMars, p. 48).

residuals_2pl  %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows that most item pairs are independent, with only one pair showing cause for concern:

residuals_2pl %>%
  rownames_to_column(var = "q1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("A"), names_to = "q2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(q1) < parse_number(q2)) %>%
  gt()
q1 q2 Q3_score
A18 A19 0.6691425

Items A18 and A19 are on the product and quotient rules.

Given that this violation of the local independence assumption is very mild, we proceed using this model.

Model parameters

We then compute factor score estimates and augment the existing data frame with these estimates, to keep everything in one place. To do the estimation, we use the fscores() function from the mirt package which takes in a computed model object and computes factor score estimates according to the method specified. We will use the EAP method for factor score estimation, which is the “expected a-posteriori” method, the default. We specify it explicitly below, but the results would have been the same if we omitted specifying the method argument since it’s the default method the function uses.

test_scores <- test_scores %>%
  mutate(F1 = fscores(fit_2pl, method = "EAP"))

We can also calculate the model coefficient estimates using a generic function coef() which is used to extract model coefficients from objects returned by modeling functions. We will set the IRTpars argument to TRUE, which means slope intercept parameters will be converted into traditional IRT parameters.

coefs_2pl <- coef(fit_2pl, IRTpars = TRUE)

The resulting object coefs is a list, with one element for each question, and an additional GroupPars element that we won’t be using. The output is a bit long, so we’re only showing a few of the elements here:

coefs_2pl[1:3]
## $A1
##                 a         b  g  u
## par     0.9470642 -3.817165  0  1
## CI_2.5  0.7202175 -4.576813 NA NA
## CI_97.5 1.1739108 -3.057518 NA NA
## 
## $A2
##                 a        b  g  u
## par     0.6050046 1.408046  0  1
## CI_2.5  0.5136783 1.184206 NA NA
## CI_97.5 0.6963309 1.631886 NA NA
## 
## $A3
##                a          b  g  u
## par     1.327727 -0.6394136  0  1
## CI_2.5  1.200583 -0.7191701 NA NA
## CI_97.5 1.454871 -0.5596570 NA NA
# coefs_2pl[35:37]

Let’s take a closer look at the first element:

coefs_2pl[1]
## $A1
##                 a         b  g  u
## par     0.9470642 -3.817165  0  1
## CI_2.5  0.7202175 -4.576813 NA NA
## CI_97.5 1.1739108 -3.057518 NA NA

In this output:

  • a is discrimination
  • b is difficulty
  • endpoints of the 95% confidence intervals are also shown

To make this output a little more user friendly, we should tidy it such that we have a row per question. We’ll do this in two steps. First, write a function that tidies the output for one question, i.e. one list element. Then, map this function over the list of all questions, resulting in a data frame.

tidy_mirt_coefs <- function(x){
  x %>%
    # melt the list element
    melt() %>%
    # convert to a tibble
    as_tibble() %>%
    # convert factors to characters
    mutate(across(where(is.factor), as.character)) %>%
    # only focus on rows where X2 is a or b (discrimination or difficulty)
    filter(X2 %in% c("a", "b")) %>%
    # in X1, relabel par (parameter) as est (estimate)
    mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
    # unite columns X2 and X1 into a new column called var separated by _
    unite(X2, X1, col = "var", sep = "_") %>%
    # turn into a wider data frame
    pivot_wider(names_from = var, values_from = value)
}

Let’s see what this does to a single element in coefs:

tidy_mirt_coefs(coefs_2pl[1])
## # A tibble: 1 x 7
##   L1    a_est a_CI_2.5 a_CI_97.5 b_est b_CI_2.5 b_CI_97.5
##   <chr> <dbl>    <dbl>     <dbl> <dbl>    <dbl>     <dbl>
## 1 A1    0.947    0.720      1.17 -3.82    -4.58     -3.06

And now let’s map it over all 32 elements of coefs:

tidy_2pl <- map_dfr(coefs_2pl[1:32], tidy_mirt_coefs, .id = "Question")

A quick peek at the result:

tidy_2pl
## # A tibble: 32 x 7
##    Question a_est a_CI_2.5 a_CI_97.5  b_est b_CI_2.5 b_CI_97.5
##    <chr>    <dbl>    <dbl>     <dbl>  <dbl>    <dbl>     <dbl>
##  1 A1       0.947    0.720     1.17  -3.82    -4.58     -3.06 
##  2 A2       0.605    0.514     0.696  1.41     1.18      1.63 
##  3 A3       1.33     1.20      1.45  -0.639   -0.719    -0.560
##  4 A4       1.28     1.16      1.40  -0.604   -0.684    -0.525
##  5 A5       1.03     0.906     1.16  -0.634   -0.756    -0.513
##  6 A6       1.01     0.854     1.18  -2.20    -2.50     -1.90 
##  7 A7       1.36     1.23      1.49  -0.904   -0.992    -0.815
##  8 A8       1.03     0.920     1.14  -0.286   -0.375    -0.197
##  9 A9       1.59     1.41      1.76  -0.489   -0.577    -0.401
## 10 A10      1.47     1.31      1.63  -0.687   -0.779    -0.596
## # ... with 22 more rows

And a nicely formatted table of the result:

gt(tidy_2pl) %>%
  fmt_number(columns = contains("_"), decimals = 3) %>%
  data_color(
    columns = contains("a_"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("b_"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = contains("b_")) %>%
  cols_label(
    a_est = "Est.",
    b_est = "Est.",
    a_CI_2.5 = "2.5%",
    b_CI_2.5 = "2.5%",
    a_CI_97.5 = "97.5%",
    b_CI_97.5 = "97.5%"
  )
tidy_2pl %>% 
  mutate(qnum = parse_number(Question)) %>% 
  ggplot(aes(
    x = a_est,
    y = b_est
  )) +
  geom_errorbar(aes(ymin = b_CI_2.5, ymax = b_CI_97.5), width = 0, alpha = 0.5) +
  geom_errorbar(aes(xmin = a_CI_2.5, xmax = a_CI_97.5), width = 0, alpha = 0.5) +
  geom_text_repel(aes(label = Question), alpha = 0.5) +
  geom_point() +
  theme_minimal() +
  labs(x = "Discrimination",
       y = "Difficulty")

tidy_2pl %>% 
  write_csv("data-eth/OUT_2pl-results-pre-only.csv")

Comparing years and classes

Do students from different programmes of study have different distributions of ability?

Differences between years

Compare the distribution of abilities in the year groups (though in this case there is only one).

ggplot(test_scores, aes(F1, fill = as.factor(year), colour = as.factor(year))) +
  geom_density(alpha=0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by year of taking the test", 
       x = "Ability", y = "Density",
       fill = "Year", colour = "Year") +
  theme_minimal()

Differences between classes

Compare the distribution of abilities in the various classes.

ggplot(test_scores, aes(x = F1, y = class, colour = class, fill = class)) +
  geom_density_ridges(alpha = 0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  guides(fill = FALSE, colour = FALSE) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by class of taking the test", 
       x = "Ability", y = "Class") +
  theme_minimal()
## Picking joint bandwidth of 0.187

Information curves

Test information curve

plot(fit_2pl, type = "infoSE", main = "Test information")

Item information curves

plot(fit_2pl, type = "infotrace", main = "Item information curves")

Response curves

Test response curves

plot(fit_2pl, type = "score", auto.key = FALSE)

Item response curves

We can get individual item surface and information plots using the itemplot() function from the mirt package, e.g.

mirt::itemplot(fit_2pl, item = 1, 
               main = "Trace lines for item 1")

We can also get the plots for all trace lines, one facet per plot.

plot(fit_2pl, type = "trace", auto.key = FALSE)

Or all of them overlaid in one plot.

plot(fit_2pl, type = "trace", facet_items=FALSE)

An alternative approach is using ggplot2 and plotly to add interactivity to make it easier to identify items.

# store the object
plt <- plot(fit_2pl, type = "trace", facet_items = FALSE)
# the data we need is in panel.args
# TODO - I had to change the [[1]] to [[2]] since the plt has two panels for some reason, with the one we want being the 2nd panel
plt_data <- tibble(
  x          = plt$panel.args[[2]]$x,
  y          = plt$panel.args[[2]]$y,
  subscripts = plt$panel.args[[2]]$subscripts,
  item       = rep(colnames(item_scores), each = 200)
) %>%
  mutate(
    item_no = str_remove(item, "A") %>% as.numeric(),
    item    = fct_reorder(item, item_no)
    )

head(plt_data)
## # A tibble: 6 x 5
##       x     y subscripts item  item_no
##   <dbl> <dbl>      <int> <fct>   <dbl>
## 1 -6    0.112        201 A1          1
## 2 -5.94 0.118        202 A1          1
## 3 -5.88 0.124        203 A1          1
## 4 -5.82 0.131        204 A1          1
## 5 -5.76 0.137        205 A1          1
## 6 -5.70 0.144        206 A1          1
plt_gg <- ggplot(plt_data, aes(x, y, 
                          colour = item, 
                          text = item)) + 
  geom_line() + 
  labs(
    title = "2PL - Trace lines",
    #x = expression(theta),
    x = "theta",
    #y = expression(P(theta)),
    y = "P(theta)",
    colour = "Item"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

ggplotly(plt_gg, tooltip = "text")
knitr::knit_exit()